of the ChIP-Seq signal may vary depending on the binding protein studied. The ChIP-Seq
signal can be sharp, broad, or a mix of sharp and broad signals. A sharp signal characterizes
the binding site of a TF, which binds to a specific site in the DNA sequence called a
motif. Histones form broad ChIP-Seq signals because the enriched regions span several
nucleosomes and may cover long stretches of DNA. RNA polymerase II (Pol II) initiates
transcription by localizing to the promoter region of the gene and then moves along the
gene during messenger RNA transcription. Therefore, the ChIP-Seq signal of Pol II may
include both sharp and broad signals (Figure 6.1).
Peak-calling programs use sliding windows to scan the genome for these patterns and
locate the binding regions by counting both Watson (forward-strand) and Crick
(reverse-strand) tags. However, for both kinds of tags to fit in a single window, they must
be shifted toward the center: in reference coordinates, Watson tags are shifted toward the
3′ end and Crick tags are shifted toward the 5′ end, so that together they form a single
peak at the putative binding site. Peak-calling programs like MACS take advantage of the
bimodal pattern to empirically model the shift size and precisely locate the binding
sites [2] on the genomic DNA sequences.
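The shifting and window-counting idea described above can be illustrated with a minimal Python sketch. This is a toy illustration, not the MACS algorithm: the function names (`estimate_shift`, `shift_tags`, `window_counts`), the median-based shift estimate, and the tag positions are all assumptions made for the example. It estimates the shift as half the distance between the Watson and Crick tag medians in one region, moves the tags toward the center, and counts tags in sliding windows.

```python
from collections import Counter
from statistics import median


def estimate_shift(watson, crick):
    """Toy estimate (an assumption, not MACS's model): half the distance
    between the median Watson and median Crick tag positions."""
    return max(0, round((median(crick) - median(watson)) / 2))


def shift_tags(watson, crick, shift):
    """Move Watson tags toward the 3' end (+shift) and Crick tags toward
    the 5' end (-shift) in reference coordinates, then pool them."""
    return [p + shift for p in watson] + [p - shift for p in crick]


def window_counts(positions, window, step):
    """Count tags per sliding window [start, start + window).
    Window starts are multiples of `step`; returns {start: count}."""
    counts = Counter()
    for p in positions:
        # Smallest window start (multiple of step) whose window still covers p.
        first = max(0, ((p - window) // step + 1) * step)
        for start in range(first, p + 1, step):
            counts[start] += 1
    return dict(counts)


# Hypothetical 5' tag positions around one binding site near position 150.
watson = [90, 95, 100, 105, 110]
crick = [190, 195, 200, 205, 210]

shift = estimate_shift(watson, crick)
unshifted = window_counts(watson + crick, window=50, step=25)
shifted = window_counts(shift_tags(watson, crick, shift), window=50, step=25)
best_start = max(shifted, key=shifted.get)

print(shift)                            # 50
print(max(unshifted.values()))          # 5
print(best_start, shifted[best_start])  # 125 10
```

Before shifting, no 50-bp window holds more than five tags, because the Watson and Crick tags form two separate modes. After shifting, all ten tags fall into the window starting at position 125, which is the single peak the text describes.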
Peak calling is a step unique to ChIP-Seq data analysis. It aims to identify the
genomic regions that are occupied by the protein of interest and are enriched by the ChIP.
The abundance of aligned reads in a sliding window, normalized by the input reads, is the
basis of peak calling, and statistical tests determine the significance of each peak.
The ChIP-Seq tags are usually normalized by input reads (control), but some peak callers
can also call peaks without using input reads. Instead, they assume even background signal
FIGURE 6.2 ChIP-Seq read alignment. (The peaks represent reads aligned to the reference
genome.)
FIGURE 6.1 Sharp signal (TF), broad signal (histones), and mixed signal (Pol II).